Whole genome sequencing-based identification of human tuberculosis caused by animal-lineage Mycobacterium orygis

ABSTRACT A recently described member of the Mycobacterium tuberculosis complex (MTBC) is Mycobacterium orygis, which can cause disease primarily in animals but also in humans. Although M. orygis has been reported from different geographic regions around the world, due to a lack of proper identification techniques, the contribution of this emerging pathogen to the global burden of zoonotic tuberculosis is not fully understood. In the present work, we report single nucleotide polymorphism (SNP) analysis using whole genome sequencing (WGS) that can accurately identify M. orygis and differentiate it from other members of the MTBC species. WGS-based SNP analysis was performed for 61 isolates from different provinces in Canada that were identified as M. orygis. A total of 56 M. orygis sequences from the public databases were also included in the analysis. Several unique SNPs in the gyrB, PPE55, Rv2042c, leuS, mmpL6, and mmpS6 genes were used to determine their effectiveness as genetic markers for the identification of M. orygis. To the best of our knowledge, five of these SNPs, viz., gyrB 277 (A→G), gyrB 1478 (T→C), leuS 1064 (A→T), mmpL6 486 (T→C), and mmpS6 334 (C→G), are reported for the first time in this study. Our results also revealed several SNPs specific to other species within MTBC. The phylogenetic analysis shows that the studied genomes were genetically diverse and clustered with M. orygis sequences of human and animal origin reported from different geographic locations. Therefore, the present study provides a new insight into the high-confidence identification of M. orygis from MTBC species based on WGS data, which can be useful for reference and diagnostic laboratories.

Tuberculosis is a severe and complex infectious disease caused by the Mycobacterium tuberculosis complex (MTBC) and remains a public health concern, leading to the death of 1.6 million people per year worldwide, predominantly in low- and middle-income countries (1). Mycobacterium orygis, a species belonging to the MTBC, was first described by van Ingen et al. (2). M. orygis is capable of causing infection in both animal and human hosts. This species has received considerable interest in recent years and has been isolated from dairy cattle and captive monkeys (3), captive wild animals (4), deer (5), rhinoceros (6), black buck (5), and humans (2,7,8). M. orygis infection has been recognized as a zoonotic source of human tuberculosis (9). Moreover, in New Zealand, a presumptive transmission of M. orygis from human to animal was reported, with the original infection traced to contact with domestic animals in India (10). M. orygis is endemic in Southeast Asian countries, including Bangladesh, India, Nepal, and Pakistan (3,7,10). Although tuberculosis incidence in Canada is relatively low (11), zoonotic tuberculosis is increasingly recognized as a significant threat to public health. Hence, zoonotic tuberculosis could pose a significant challenge in controlling this disease and meeting global tuberculosis elimination goals (12).
Along with M. orygis, other phylogenetically related MTBC species are M. tuberculosis, Mycobacterium bovis and its variant vaccine strain Bacille Calmette-Guérin (BCG), Mycobacterium africanum, Mycobacterium caprae, Mycobacterium pinnipedii, Mycobacterium microti, Mycobacterium canettii, and members of animal-adapted clade A1 such as "Dassie" bacillus, Chimpanzee bacillus, Mycobacterium suricattae, and Mycobacterium mungi (13-15). To improve human and animal health surveillance, it is important to implement proper identification methods and analysis tools for quickly and accurately discriminating MTBC species. Probe hybridization-based assays have been used to differentiate the causative agents of tuberculosis; however, studies have reported a limitation of this method in differentiating M. orygis from M. africanum (16). Mutations in genes including gyrB and Rv2042c and deletions of regions of difference (RDs) have been used to discriminate M. orygis from other MTBC species using PCR-based approaches (2,3,17). Since M. orygis strains share the gyrB 1450 (G→T) mutation with other MTBC members, including M. africanum and M. pinnipedii, they may have previously been mislabeled as M. africanum (3,17,18).
Whole genome sequencing (WGS) technologies are now increasingly being used in clinical and research laboratories for tuberculosis surveillance, outbreak detection, antimicrobial resistance prediction, and the characterization and diversity analysis of MTBC species (14,19). One of the key challenges in adopting WGS for these applications is data analysis, which requires bioinformatics support and data interpretation (20). Consequently, a number of analytical tools have been developed to detect pathogenic bacterial strains using WGS data, with single nucleotide polymorphism (SNP)-based methods being the most common ones used in public health laboratories (21-23). BioHansel, for example, performs high-resolution genotyping by detecting phylogenetically informative SNPs in WGS data (24).
To improve our ability to accurately identify M. orygis and other species within the MTBC, whole genome analysis was performed in this investigation. We report the identification of 61 new M. orygis isolates from Canada by WGS-based SNP analysis, and to validate the results, 56 publicly available M. orygis genomes were added to the analysis. We performed molecular marker characterization on these genomes, which demonstrated a clear differentiation of M. orygis from members of other MTBC species. Furthermore, the newly sequenced M. orygis isolates were phylogenetically analyzed to determine their diversity and global distribution. This study may improve our understanding of this poorly monitored emerging zoonotic pathogen and address its burden in Canada and globally.

Sample collection and project background
Cultures from across Canada were received at the National Reference Centre for Mycobacteriology (NRCM), National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg for M. tuberculosis testing between 2009 and 2022. The cultures were grown on mycobacteria growth indicator tube media and Middlebrook 7H10 plates using standard and aerobic growth conditions. At NRCM, the presence of the insertion sequence IS6110 and region of difference RD9 was confirmed by reverse transcription PCR analysis to detect MTBC and M. tuberculosis, respectively. Classical methods such as gyrB, RD1, RD4, RD7, and mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) typing were also used for identification and genotyping purposes. These methods can distinguish between the most commonly isolated MTBC species, M. tuberculosis, M. bovis and its variant BCG, M. caprae, and M. africanum, but they are unable to identify some rare species of the complex, such as M. orygis and M. pinnipedii. Since 2018, our group has routinely used WGS technologies and the BioHansel (24) program for identification and differentiation between MTBC species. BioHansel performs high-resolution genotyping of MTBC by detecting phylogenetically informative SNPs in WGS data. However, the SNPs in the gyrB gene (at positions 432, 513, 870, 1068, 1167, and 1207) currently incorporated in BioHansel cannot differentiate M. orygis, and the pipeline eventually identifies this species, with low confidence, as a member of an animal lineage of the MTBC (probable M. tuberculosis var. orygis) based on SNP typing (2). We then separated from the inventory all isolates identified as animal lineage "probable M. orygis," all isolates with the same MIRU-VNTR profile, and all isolates identified as M. africanum, M. pinnipedii, or M. bovis/BCG, and investigated them in this study. Therefore, a total of 137 isolates were interrogated (Supplementary Data S1), of which 61 were identified as M. orygis. Most of the isolates were from British Columbia "BC" (n = 38), followed by Alberta "AB" (n = 19), Manitoba "MB" (n = 2), Nova Scotia "NS" (n = 1), and Saskatchewan "SK" (n = 1). The detailed information (e.g., source province, specimen type, and identification) on M. orygis isolates from this study is given in Table 1.

Genomic comparison of the new sequences with the public database
The newly sequenced genomes were compared with 56 publicly available M. orygis genomes reported from around the world, downloaded from the National Center for Biotechnology Information (NCBI) GenBank (31), the NCBI Sequence Read Archive (SRA) (32), and the European Nucleotide Archive (33). The list of reference M. orygis genomes with their accession numbers, collection centers, country of origin, host species, and specimen source is presented in Supplementary Data S2A. Aside from M. orygis, a total of 190 previously published genome sequences (complete/draft/SRA) representing all MTBC species and lineages (L1 to L9), including its animal lineages, were also downloaded from the above repositories and added to the analysis (Supplementary Data S2B).

Gene selection and R analysis: unique SNPs for M. orygis and other MTBC species/lineages
To develop a simple, quick, and effective identification approach, a set of six genes, viz., gyrB, PPE55, Rv2042c, leuS, mmpL6, and mmpS6, was selected for SNP analysis based on previous publications (2,17,34). The sequences of the gyrB, PPE55, Rv2042c, and leuS genes were obtained from H37Rv (accession no. NC_000962.3), and the mmpL6 and mmpS6 gene sequences were recovered from the Mtb-specific deletion 1 region "TbD1" (accession no. AJ426486.1). The numerical positions for the new and known SNPs in the gyrB, Rv2042c, PPE55, and leuS genes are relative to the M. tuberculosis H37Rv reference genome (accession no. NC_000962.3) (35), while the SNP positions within the mmpL6 and mmpS6 genes are given relative to the TbD1 region (AJ426486.1) (36). All identified SNPs within the six genes are numbered according to gene position (5ʹ to 3ʹ), and the complementary sequences were used for genes Rv2042c and PPE55 (Table 2). To extract the above-mentioned six gene sequences from M. orygis and MTBC species/lineage genomes, a high-throughput analysis was performed using our custom in-house-developed R scripts in RStudio v.2021.9.2.382 (38) (Supplementary Data S3), which used the Basic Local Alignment Search Tool (BLAST) (39), with the e-value cutoff option set to 10e-100. The scripts interrogated the assembled genomes using a reference gene to identify and extract the sequence corresponding to the SNPs of interest. An NCBI BLAST search was also done for all genes, with any unique sequences investigated. To identify new and known SNPs specific to M. orygis or other MTBC species, the extracted gene sequences were aligned using MUSCLE (40), and sequence variations of the unique SNPs were detected using MEGA-X v.10.0.5 (41) and Geneious Prime 2022 (Geneious Prime 2022.0.2, https://www.geneious.com).
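The gene-position numbering described above can be illustrated with a minimal Python sketch. It is not the authors' R script (that is in Supplementary Data S3); the function names, the toy genome, and the coordinates are hypothetical, and it assumes that "complementary sequences" for Rv2042c and PPE55 means the reverse complement of the reference strand, so that positions are counted 5ʹ to 3ʹ along the gene.

```python
# Illustrative sketch only: toy sequence and coordinates, not real H37Rv annotation.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_sequence(genome: str, start: int, end: int, strand: str) -> str:
    """Extract a gene in its own 5'->3' orientation.

    start and end are 1-based, inclusive genome coordinates (hypothetical inputs);
    strand is "+" or "-".
    """
    region = genome[start - 1:end]
    return reverse_complement(region) if strand == "-" else region


def base_at(gene_seq: str, position: int) -> str:
    """Return the base at a 1-based gene position."""
    if not 1 <= position <= len(gene_seq):
        raise ValueError("position lies outside the extracted gene")
    return gene_seq[position - 1]


toy_genome = "AAACCCGGGTTT"
plus_gene = gene_sequence(toy_genome, 4, 9, "+")   # forward-strand toy gene
minus_gene = gene_sequence(toy_genome, 1, 6, "-")  # reverse-strand toy gene
```

For a reverse-strand gene, position 1 is the last genome base of the region after complementing, which is why the orientation step has to happen before any SNP position is read off.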

The SNP-based phylogenetic reconstruction of M. orygis
To compare M. orygis sequences from the present study with those available in the public repositories, a whole genome phylogeny was analyzed using the single nucleotide variant phylogenomics (SNVPhyl) pipeline v.1.2.3 in IRIDA (23). The snvAlignment file obtained from SNVPhyl was then used to construct a maximum likelihood tree using RAxML v.8.2.12 with the GTRGAMMA model and 100 rapid bootstrap replicates (42). The output phylogenetic tree was visualized using the online tool iTOL (Interactive Tree Of Life) v.6 (43). M. tuberculosis H37Rv (NC_000962.3) was chosen as a reference genome with the following parameters: minimum SNV coverage of 15, SNV abundance ratio of 0.75, and minimum mapping quality of 30. SNV density filtering was enabled to remove high-density SNV regions that could suggest possible recombination. The identified SNVs were used in the construction of a phylogeny by calculating the genetic distance between isolates using a generalized time reversible model with PhyML 3.0 (44).
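As a rough illustration of what the three thresholds above mean for a single candidate site, the following Python sketch applies them to one record. It is a simplification under stated assumptions, not SNVPhyl's actual implementation: the `CandidateSNV` record, its fields, and the use of a mean mapping quality per site are hypothetical.

```python
from dataclasses import dataclass

MIN_COVERAGE = 15     # minimum SNV coverage (value from the text)
MIN_ALT_RATIO = 0.75  # SNV abundance ratio (value from the text)
MIN_MAPQ = 30         # minimum mapping quality (value from the text)


@dataclass
class CandidateSNV:
    """Hypothetical per-site summary; not an SNVPhyl data structure."""
    position: int
    depth: int        # reads covering the site
    alt_reads: int    # reads supporting the variant base
    mean_mapq: float  # assumed summary of mapping quality at the site


def passes_filters(snv: CandidateSNV) -> bool:
    """Return True if the site meets all three thresholds."""
    if snv.depth < MIN_COVERAGE:
        return False
    if snv.mean_mapq < MIN_MAPQ:
        return False
    # depth >= 15 here, so the division is safe
    return snv.alt_reads / snv.depth >= MIN_ALT_RATIO


well_supported = CandidateSNV(position=100, depth=20, alt_reads=18, mean_mapq=40.0)
low_coverage = CandidateSNV(position=200, depth=10, alt_reads=10, mean_mapq=60.0)
mixed_site = CandidateSNV(position=300, depth=40, alt_reads=20, mean_mapq=60.0)
```

Under these assumptions, a site with half of its reads supporting the variant is rejected even at high depth, which is the practical effect of the 0.75 abundance ratio.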

Data collection and genome analysis
In the present study, a total of 61 M. orygis isolates were identified retrospectively from cultures submitted by five different provinces in Canada over a period of 14 years (2009-2022), with more than 90% from British Columbia and Alberta. The cultures were recovered from pulmonary (65%) and extra-pulmonary samples. Forty-five M. orygis cases were female patients (74%), 14 were male patients (23%), and for 2 cases the gender was not provided on the requisition. The PCR, MIRU-VNTR testing, BioHansel, and Mykrobe Predictor v.0.7.0 (45) analyses identified the isolates as M. africanum, M. africanum/M. pinnipedii, or MTBC animal lineages (Table 1). We then performed WGS for all 61 isolates and determined their identification as M. orygis using SNP analysis. The sequencing data were augmented with a total of 56 publicly available M. orygis genome sequences to validate their identification and discrimination from other MTBC species. Of these 56 sequences, 35 were reported from India, followed by 8 from the United States, 5 from Norway, 4 from Switzerland, 2 from the United Kingdom, 1 from the Netherlands, and 1 from Canada (Supplementary Data S2A).

Whole genome-based SNP analysis
We performed the WGS-based SNP analysis for the identification and genetic differentiation of M. orygis from other MTBC members. Several new and known SNPs in the gyrB (2,028 bp), PPE55 (2,405 bp), Rv2042c (798 bp), leuS (2,910 bp), mmpL6 (2,904 bp), and mmpS6 (447 bp) genes were evaluated to ascertain their effectiveness as molecular markers for M. orygis. By analysis of multiple sequence alignments from 117 M. orygis genomes (61 from this study and 56 from public repositories) and 190 sequences of MTBC species and lineages, we identified 10 unique SNPs across the six selected genes that can identify and discriminate M. orygis from all members of the MTBC. These M. orygis-specific SNPs, along with their corresponding genomic positions, mutation type, locus, and function, are listed in Table 2.
A multiple alignment of the gyrB gene showed the presence of two novel nonsynonymous mutations at positions 277 (A→G) and 1478 (T→C), along with a known synonymous mutation at position 870 (G→A) (Tables 2 and 3). Within the PPE55 gene, our study identified two M. orygis-specific SNPs at positions 2162 (C→T) and 2163 (T→G). The SNP analysis also identified a unique SNP in the Rv2042c gene at position 113 (T→G). In addition, we found two M. orygis-specific SNPs within the leuS gene, one of which was a novel SNP (leuS 1064 A→T) associated with a nonsynonymous mutation, while the other (leuS 1251 G→T) was previously reported (Tables 2 and 4). The sequence of the TbD1 region (AJ426486.1) from the ancestral M. tuberculosis contains the genes mmpS6 and mmpL6. Two more novel SNPs were detected in this region and were found to be associated with a synonymous (mmpL6 486 T→C) and a nonsynonymous (mmpS6 334 C→G) mutation (Tables 2 and 3). The genes mmpL6 and mmpS6 encode proteins of the mycobacterial membrane protein large (MmpL) and small (MmpS) families, respectively. The above 10 unique SNPs situated within a set of six genes were identified in all 117 M. orygis strains screened in the present study.
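The 10 M. orygis-specific SNPs listed above can be written down as a small lookup panel. The Python sketch below is illustrative only: the positions, alleles, and gene lengths are taken from the text, whereas the function names, the scoring logic, and the synthetic test sequences are assumptions and are not the authors' pipeline. It expects each gene sequence in gene orientation (5ʹ to 3ʹ) with the same numbering as the reference.

```python
# Marker positions and M. orygis alleles as reported in the text.
ORYGIS_MARKERS = {
    ("gyrB", 277): "G", ("gyrB", 870): "A", ("gyrB", 1478): "C",
    ("PPE55", 2162): "T", ("PPE55", 2163): "G",
    ("Rv2042c", 113): "G",
    ("leuS", 1064): "T", ("leuS", 1251): "T",
    ("mmpL6", 486): "C", ("mmpS6", 334): "G",
}

# Gene lengths (bp) as given in the text; used only to build toy sequences.
GENE_LENGTHS = {"gyrB": 2028, "PPE55": 2405, "Rv2042c": 798,
                "leuS": 2910, "mmpL6": 2904, "mmpS6": 447}


def score_markers(genes):
    """Count matching marker alleles; also list markers that cannot be read.

    genes maps gene name -> sequence string (hypothetical input format).
    """
    matched = 0
    unresolved = []
    for (gene, pos), allele in ORYGIS_MARKERS.items():
        seq = genes.get(gene, "")
        if pos > len(seq):
            unresolved.append((gene, pos))  # gene absent or truncated
        elif seq[pos - 1].upper() == allele:
            matched += 1
    return matched, unresolved


def toy_orygis_like_genes():
    """Build synthetic sequences of N's carrying all ten marker alleles."""
    genes = {name: ["N"] * length for name, length in GENE_LENGTHS.items()}
    for (gene, pos), allele in ORYGIS_MARKERS.items():
        genes[gene][pos - 1] = allele
    return {name: "".join(bases) for name, bases in genes.items()}


matched, unresolved = score_markers(toy_orygis_like_genes())
```

Reporting unresolved markers separately matters for genomes in which mmpL6 or mmpS6 is partly or fully deleted, since an absent position is different from a non-matching allele.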
To define the discriminatory power of the WGS-based SNP analysis in identifying all MTBC species/lineages, the above 117 M. orygis (Table 1; Supplementary Data S2A) and 190 MTBC (Supplementary Data S2B) genome sequences were also analyzed, and results of representative sequences from each species and lineage are shown in Tables 3 and 4. Apart from M. orygis-specific SNPs, we identified some new and previously reported SNPs that are unique to other MTBC members and lineages. These include 11 SNPs in gyrB, 10 SNPs in leuS, 9 SNPs in PPE55, 6 SNPs in mmpL6, 3 SNPs in Rv2042c, and 2 SNPs in mmpS6 genes.

a Fourteen SNPs on gyrB, 11 SNPs on PPE55, 4 SNPs on Rv2042c, 7 SNPs on mmpL6, 3 SNPs on mmpS6, and 12 SNPs on leuS genes are shown. The nucleotides in boldface show unique SNPs for a particular species/lineage, and unique SNPs for M. orygis are highlighted in gray. "-" indicates that the nucleotides for those positions were not extracted in the analysis. The reference nucleotide positions for the SNPs within gyrB, PPE55, Rv2042c, and leuS genes are relative to M. tuberculosis H37Rv (accession no. NC_000962.3) (35), while the positions within the mmpL6 and mmpS6 genes are provided according to the TbD1 region (accession no. AJ426486.1) (36).
above, we detected SNP markers in the selected six genes that differentiated the strains belonging to different lineages from other MTBC members examined in this study.

Single nucleotide polymorphism-based phylogenetic inference of M. orygis
A whole genome SNP-based phylogenetic tree was built to determine the genetic similarity of the newly sequenced M. orygis isolates in Canada with M. orygis sequences from the public repositories (Fig. 1). The SNV distance matrix generated from the whole genome alignments lists the pairwise SNV distances between every pair of sequences (Supplementary Data S4). The genome sequences from this study clustered with M. orygis sequences reported from all five geographic locations. M. orygis sequences from different samples of the same patient clustered together in the phylogenetic tree, as expected. For example, BC isolate 1801225 clustered with 1801535, 1900059 with 1900097, 2001127 with 2001229, 2001439 with 2001480, and 2200196 with 2200263, and AB isolate 2101205 clustered with 2101375, each pair at an SNV distance of 0-5. The isolates 2001439 and 2001480 also clustered (at distances of 29-52 SNVs) with two M. orygis sequences from Norway (ERR5336158) and the United States (SRR5642712) and one previously reported M. orygis sequence 51145 (SRR16643349) from Quebec, Canada. Similarly, 11 M. orygis sequences from the present study formed a large cluster with 3 sequences (ERR2659153, ERR2659154, ERR2659156) from Switzerland, 1 sequence (ERR5336157) from Norway, and another sequence (ERR015582) reported from the United Kingdom, with 2 sequences (1800965 from BC and 1900228 from AB) showing a distance of 92-96 SNVs from the M. orygis sequences from Switzerland (Fig. 1; Supplementary Data S4).
The studied isolates 1901282 and 2100725 from AB clustered with 11 M. orygis sequences of animal origin from India and 1 sequence of human origin from the United States with a pairwise SNV distance of 34-88. Interestingly, the isolate 2100725 clustered with seven of these animal-origin sequences (SRR10251185, SRR10251189-92, SRR10251197, and SRR10251200) at a distance of 34 SNVs. However, distances of 34-88 SNVs are large for the MTBC and do not indicate spillover or its direction.
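The pairwise SNV distances discussed in this section can be pictured as a count of differing columns between two rows of the SNV alignment. The following Python sketch is a simplified illustration, not the SNVPhyl distance-matrix code: the isolate names and sequences are toy values, and skipping columns that contain a gap or N is an assumption.

```python
from itertools import combinations


def snv_distance(seq_a: str, seq_b: str, ignore: str = "-N") -> int:
    """Count alignment columns where two sequences differ.

    Columns containing a character from `ignore` in either sequence are
    skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snv_distances(alignment):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (n1, n2): snv_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical isolate names.
toy_alignment = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "TCGTNCGA",
}
distances = pairwise_snv_distances(toy_alignment)
```

In a matrix built this way, two samples from the same patient would be expected to sit at the low end of the range, as with the 0-5 SNV pairs reported above.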

DISCUSSION
The present study is the first to report a large number of M. orygis isolates from a single geographic area. M. orygis is a causative agent of tuberculosis in both animal and human hosts; it was first described in 2012 (2) and later by others (3-5, 7, 8). The MTBC species M. bovis has long been believed to be the only agent that causes zoonotic tuberculosis; however, the recovery of a larger number of M. orygis isolates from both animals and humans in recent years from different areas of the world highlights the need to consider this bacterium a zoonotic pathogen (46).
Since tuberculosis cases caused by M. orygis are often identified only as MTBC or are misidentified and published as M. africanum or M. bovis, the actual number of infections associated with this bacterium may have been underreported (3,16,37). The investigation of our culture collection showed that M. orygis has been misidentified as M. africanum since 2009. Rahim et al. (18) initially reported M. africanum from four dairy cows that were later identified as M. orygis by refined analysis (3). Part of the reason for this misidentification is that M. orygis shares the gyrB 1207 (G→T) mutation (gyrB 1450 as per accession no. L27512.1) (47) with M. africanum, M. bovis, M. microti, M. caprae, M. mungi, and M. pinnipedii (17) (Table 3). Moreover, the MTBC is genetically highly clonal, and thus, without proper identification tools and analysis approaches, species differentiation can be challenging.
Since currently available laboratory tests struggle to differentiate animal lineages, in the present investigation, we used WGS-based SNP analysis targeting the gyrB, PPE55, Rv2042c, leuS, mmpL6, and mmpS6 genes, which can accurately identify M. orygis and unambiguously differentiate it from all other members of the MTBC. The gyrB gene encodes the β-subunit of DNA gyrase and has been used as a molecular marker for the identification of MTBC members. The discriminatory power of polymorphisms in the gyrB gene in identifying M. orygis has been evaluated in previous studies (3,17). While these authors described only one unique SNP (gyrB 870 G→A) (gyrB 1113 according to accession no. L27512.1), we detected two more novel and useful genetic markers within the gyrB gene (gyrB 277 A→G and gyrB 1478 T→C) in the screened M. orygis genomes.
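The two gyrB numbering schemes mentioned in the text can be related by simple arithmetic. Both position pairs given here (1207 and 1450; 870 and 1113) differ by 243, so the Python sketch below converts between them with that offset. The constant offset is inferred from those two pairs only and is assumed to hold across the gene; the function names are hypothetical.

```python
# Offset inferred from the pair gyrB 1207 (H37Rv gene numbering) and
# gyrB 1450 (accession no. L27512.1 numbering) quoted in the text.
L27512_OFFSET = 1450 - 1207  # 243


def h37rv_to_l27512(position: int) -> int:
    """Convert an H37Rv gyrB gene position to L27512.1 numbering (assumed constant offset)."""
    return position + L27512_OFFSET


def l27512_to_h37rv(position: int) -> int:
    """Convert an L27512.1 position back to H37Rv gyrB gene numbering."""
    return position - L27512_OFFSET


converted_870 = h37rv_to_l27512(870)    # second pair from the text, used as a check
converted_1450 = l27512_to_h37rv(1450)
```

The second pair from the text (870 and 1113) serves as an independent check of the inferred offset.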
The PPE55 (Rv3347c) gene is specific to MTBC species and plays a major role in host-pathogen interactions. Huard et al. (17) earlier reported two M. orygis-specific SNPs in the PPE55 gene (PPE55 2162 C→T and PPE55 2163 T→G), which were also found in our study. The results suggest that these SNPs could be used as genetic markers for the identification of M. orygis. We also evaluated the Rv2042c gene for possible unique SNPs to be used as molecular markers to identify M. orygis. In accordance with van Ingen et al. (2), our SNP analysis identified a nonsynonymous mutation in the 38th codon of the Rv2042c gene at position 113 (T→G). In mycobacterial species, the leuS gene encodes L-leucyl-tRNA synthetase, which is involved in translation. In this study, in addition to the SNP leuS 1251 (G→T) described by Napier et al. (34), we showed that another novel SNP, leuS 1064 (A→T), associated with a nonsynonymous mutation, is present in M. orygis strains (Tables 2 and 4). The Mtb-specific deletion 1 (TbD1) region comprises the mmpL6 and mmpS6 genes, which encode proteins of the mycobacterial membrane protein families. In modern M. tuberculosis strains, the mmpS6 gene is fully deleted and the mmpL6 gene is truncated (48). Our results show that the two SNPs mmpL6 486 T→C and mmpS6 334 C→G are novel and distinct genetic markers for the identification of M. orygis. We included the previously described M. orygis-specific polymorphisms in the WGS-based SNP analysis, and they were confirmed in a larger number (61 from this study and 56 from public repositories) of M. orygis genomes tested in this work. Thus, the results suggest that WGS-based SNP analysis is a useful tool to rapidly identify M. orygis and to clearly differentiate this emerging pathogen from all MTBC species and lineages. To the best of our knowledge, the unique SNPs gyrB 277 (A→G), gyrB 1478 (T→C), leuS 1064 (A→T), mmpL6 486 (T→C), and mmpS6 334 (C→G) are reported for the first time in the present study.
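The statement that position 113 of Rv2042c falls in the 38th codon follows from integer arithmetic on 1-based gene positions. A minimal Python sketch is given below; it assumes that position 1 is the first base of the start codon and that the reading frame is uninterrupted, and the function name is hypothetical.

```python
def codon_of(position: int):
    """Map a 1-based gene position to (codon number, position within codon).

    Assumes position 1 is the first base of the start codon.
    """
    if position < 1:
        raise ValueError("gene positions are 1-based")
    codon_number = (position - 1) // 3 + 1
    base_in_codon = (position - 1) % 3 + 1
    return codon_number, base_in_codon


rv2042c_113 = codon_of(113)  # Rv2042c marker position from the text
mmpl6_486 = codon_of(486)    # mmpL6 marker position from the text
```

Under the same assumption, mmpL6 486 falls on a third codon position, which fits its description as synonymous, while Rv2042c 113 falls on a second codon position, where a substitution changes the encoded amino acid.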
WGS-based SNP analysis has been employed for the identification of MTBC species (9), and this approach has also become pertinent to MTBC genotyping (36,49). In this study, we further identified the species-specific polymorphisms for members of MTBC species (excluding M. orygis here, as discussed above) and lineages, and results from the gyrB gene confirm the discrimination of several MTBC members, including M. tuberculosis, M. bovis, M. canettii, M. microti, and M. caprae (Table 3). These results are in agreement with Huard et al. (17), who reported the identification of the above species (along with M. orygis) using a PCR-based SNP analysis. Our study supports the idea that many genomic characteristics could be shared between M. bovis and M. caprae strains, for example, the gyrB mutation at position 513 and the PPE55 mutation at position 556 (17).
Furthermore, we detected unique polymorphisms for lineage 1 at gyrB 873 (G→C); for lineage 2 at Rv2042c 597 (C→G); for lineage 3 in the PPE55 gene at positions 1173 (G→C), 1177-1178 (A→G), 1179 (C→T), and 1182 (C→G); for lineage 5 at PPE55 923 (G→T), mmpS6 360 (C→T), and leuS 2736 (T→C); for lineage 7 at mmpL6 1780 (G→C) and mmpS6 339 (C→T); and for lineage 9 at leuS 1363 (C→T). Some SNPs in the mmpL6 and mmpS6 genes could not be extracted for M. tuberculosis H37Rv and lineages 1-4 in Tables 3 and 4 because these genes are partly or fully deleted in modern M. tuberculosis, as discussed above. Our results indicate that WGS-based SNP analysis could be successfully used to distinguish all members of MTBC species and lineages. Since we tested a limited number of sequences due to availability and data quality, particularly for M. pinnipedii, M. caprae, and M. mungi, further investigation with a larger data set would be useful for their specific identification with high confidence.
The use of a whole genome SNP-based phylogenetic tree allowed us to inspect the genetic relationship of M. orygis recovered from Canada and those reported from other geographic regions. The M. orygis isolates of human origin from this study were found to be distributed across the phylogeny (Fig. 1). Phylogenetic analysis of M. orygis genomes from the same patient during the same episode (two isolates from each of the six patients) showed a difference of 0-5 SNVs.
We sought to determine whether M. orygis identified in this study showed a close phylogenetic relationship with animal strains. The result shows that the isolate 2100725 from an AB patient clustered (34 SNVs apart) with several M. orygis sequences of animal origin reported from India (Fig. 1; Supplementary Data S4). This result may be indicative of the adaptation of animal-origin M. orygis strains to a human host. The zoonotic or zooanthroponotic potential of M. orygis has been discussed in a previous work (46). Furthermore, two isolates from BC were related (5-39 SNVs apart) to M. orygis sequences of human origin reported from Norway (ERR5336158) (8) and the United States (SRR5642712) (7) and separated by 49-52 SNVs from the previously published Canadian strain 51145 (50). Thus, the newly sequenced M. orygis from Canada phylogenetically clustered with M. orygis sequences reported from different geographic locations, placing it within the global context.
In conclusion, the WGS analysis in the present study evaluated 10 novel and known unique SNPs within a set of six genes that could be used as molecular genetic markers to accurately identify M. orygis and unambiguously discriminate it from all members of MTBC species and lineages. As WGS technologies are increasingly being used by healthcare systems, our approach will be helpful for the diagnosis and surveillance of M. orygis-associated tuberculosis and for optimizing the clinical management of this disease. The analysis of M. orygis sequences will improve our understanding of the molecular characteristics and phylogenetic diversity of this emerging pathogen and its implications as a zoonosis. The ever-increasing evidence of M. orygis-linked endemicity and the identification of a greater number of M. orygis from animals and humans around the world highlight the urgency for a multi-sectoral collaboration linking the clinical and veterinary sectors toward a One Health approach. The origin, epidemiology, and transmission dynamics of M. orygis within Canada are currently under investigation.

FIG 1
Phylogenetic tree of M. orygis based on SNV analysis. The whole genome phylogeny was analyzed using SNVPhyl pipeline v.1.2.3, and the snvAlignment file was used to infer the tree using RAxML v.8.2.12 with the GTRGAMMA model and 100 rapid bootstrap replicates. The output tree was visualized using iTOL v.6. A maximum likelihood tree of 61 newly sequenced M. orygis isolates from this study (labeled in red), and the M. orygis genomes collected from public repositories. The left bar adjacent to the tree nodes shows the source of M. orygis (human in purple and animal in orange), and the right bar denotes the geographic location of M. orygis (Canada in green, India in blue, Norway in pink, Switzerland in cyan, the UK in red, and the USA in yellow). M. tuberculosis H37Rv (NC_000962.3) was used as a reference in this analysis. The filled circles on nodes represent a bootstrap value between 80 and 100. The studied isolates mostly clustered with M. orygis of human origin recovered from different geographic regions. The isolate 2100725 showed a good phylogenetic relationship (pairwise distances of 34 SNVs) with seven M. orygis of animal origin from India.

TABLE 1
Newly identified 61 M. orygis isolates from this study a
a PCR, polymerase chain reaction; LLL, left lower lobe; LUL, left upper lobe; RUL, right upper lobe; FNA, fine needle aspiration.

TABLE 3
Summary of SNP analysis for M. orygis differentiation from other MTBC species and lineages ab
The gene sequence analysis identified an SNP (mmpL6 306 C→T) that can differentiate M. africanum lineage 6 from other MTBC members and lineages. While the SNPs in the gyrB gene at position 1167 (C→T) and in leuS at position 1632 (G→A) were found to discriminate M. bovis and M. bovis BCG from all others, the SNP PPE55 1701 A→G could even separate M. bovis from BCG. Furthermore, M. bovis and BCG shared SNPs gyrB 513 G→A and PPE55 556 C→A with M. caprae strains (Tables

TABLE 4
Summary of SNP analysis for M. orygis differentiation from other MTBC species and lineages a